Semantic Annotating of Czech Corpus via WSD

Author

  • Robert Král
Abstract

We describe the relationship between word sense disambiguation (WSD) and language resources (LR) that work with word senses. We discuss the problems of sense division and sense tagging, and we encourage exploiting the specific features of inflectional languages for WSD. We present WSD methods for ambiguous Czech nouns. Their advantage is that they reduce manual work by using a training set built from synonyms. They could be used to build a semantically annotated corpus or to obtain glosses for the Czech WordNet.
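The abstract does not spell out the methods themselves. As a hedged illustration only, the sketch below shows one generic way a synonym training set can stand in for hand-tagged examples: contexts in which an unambiguous synonym of each sense occurs are treated as labelled training data for a naive Bayes classifier, which is then applied to a context of the ambiguous noun. This is not presented as the author's method. The sense labels, toy contexts, and the functions `train` and `disambiguate` are hypothetical. Contexts are assumed to be lemmatized, since in an inflectional language such as Czech raw word forms would scatter the counts across many endings.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): for each sense of an ambiguous
# noun, lemmatized contexts in which an unambiguous synonym of that sense occurs.
SYNONYM_CONTEXTS = {
    "bank/finance": [["deposit", "money", "account"], ["loan", "interest", "money"]],
    "bank/river": [["water", "fish", "shore"], ["river", "flood", "water"]],
}


def train(sense_contexts):
    """Count words per sense; synonym contexts act as automatically labelled examples."""
    model, vocab = {}, set()
    for sense, contexts in sense_contexts.items():
        counts = Counter(word for ctx in contexts for word in ctx)
        vocab.update(counts)
        model[sense] = (counts, len(contexts))
    return model, vocab


def disambiguate(context, model, vocab):
    """Return the sense with the highest naive Bayes score (add-one smoothing)."""
    total_contexts = sum(n for _, n in model.values())
    best_sense, best_score = None, -math.inf
    for sense, (counts, n_contexts) in model.items():
        total_words = sum(counts.values())
        score = math.log(n_contexts / total_contexts)  # sense prior
        for word in context:
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model, vocab = train(SYNONYM_CONTEXTS)
print(disambiguate(["money", "loan"], model, vocab))  # prints bank/finance
```

The point of the design is that no occurrence of the ambiguous noun itself has to be tagged by hand; only the choice of one unambiguous synonym per sense requires manual input.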

Similar resources

Inforex - a web-based tool for text corpus management and semantic annotation

The aim of this paper is to present a system for semantic text annotation called Inforex. Inforex is a web-based system designed for managing and annotating text corpora on the semantic level including annotation of Named Entities (NE), anaphora, Word Sense Disambiguation (WSD) and relations between named entities. The system also supports manual text clean-up and automatic text pre-processing ...

Annotating WordNet

High-quality lexical resources are needed to both train and evaluate Word Sense Disambiguation (WSD) systems. The problem of ambiguity persists even in limited domains, thus the necessity for wide-coverage inventories of senses (dictionaries) and corpora sense-tagged to them. WordNet has been used extensively for WSD, for both its broad coverage and its large network of semantic relations. In t...

Word Sense Disambiguation Corpora Acquisition via Confirmation Code

Word Sense Disambiguation (WSD) is one of the fundamental natural language processing tasks. However, the lack of training corpora is a bottleneck to constructing a highly accurate all-words WSD system. Annotating a large-scale corpus by experts costs enormous time and financial resources. Human Computation is a novel idea for integrating human resources behind the Web, which has been wasted, to solve p...

Enriching WordNet with Derivational Subnets

In this paper, we deal with the derivational (word formation) relations as they are handled by the Czech morphological module Ajka. First, we show that they represent empirically well-based semantic relations forming small semantic networks, and then we solve the problem of how to integrate them into a lexical database such as (Czech) WordNet. In this respect we examine the relation between the deri...

Building A Training Corpus For Word Sense Disambiguation In English-To-Vietnamese Machine Translation

The most difficult task in machine translation is the elimination of ambiguity in human languages. A given word in English as well as in Vietnamese often has different meanings, which depend on its syntactic position in the sentence and on the actual context. To resolve this ambiguity, people formerly resorted to many hand-coded rules. Nevertheless, manually building these rules ...

Publication date: 2004